as $\omega$ and $\hat{x}$ enter bilinearly through the product $\omega \circ \hat{x}^{[k]}$. In our discrete optimization framework, the discrete values of the convolutional kernels are updated according to their gradients. Taking Eq. 3.36 into consideration, we derive the update rule for $\hat{x}^{[k+1]}$ as
$$\hat{x}^{[k+1]} = \hat{x}^{[k]} - \eta \frac{\partial f(\omega, \hat{x}^{[k]})}{\partial \hat{x}^{[k]}} = \hat{x}^{[k]} - \omega \circ \eta\,\delta^{[k]}_{\hat{x}}. \tag{3.37}$$
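The second equality is worth unpacking. The following one-line derivation is a hedged reading, assuming that $\circ$ denotes the element-wise product and that $\delta^{[k]}_{\hat{x}}$ stands for the gradient of $f$ with respect to the product $\omega \circ \hat{x}^{[k]}$; both assumptions are ours, not stated explicitly here:
$$\frac{\partial f(\omega, \hat{x}^{[k]})}{\partial \hat{x}^{[k]}}
= \frac{\partial f}{\partial\bigl(\omega \circ \hat{x}^{[k]}\bigr)} \circ \frac{\partial\bigl(\omega \circ \hat{x}^{[k]}\bigr)}{\partial \hat{x}^{[k]}}
= \delta^{[k]}_{\hat{x}} \circ \omega,$$
which, scaled by the learning rate $\eta$, gives the term $\omega \circ \eta\,\delta^{[k]}_{\hat{x}}$ above.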
By plugging Eq. 3.37 into Eq. 3.35, we obtain a new objective, or loss function, that minimizes
$$\|\hat{x}^{[k+1]} - \omega \circ x\|, \tag{3.38}$$
to approximate
$$\hat{x} = \omega \circ x, \qquad x = \omega^{-1} \circ \hat{x}. \tag{3.39}$$
We further consider multiple projections, based on Eq. 3.39 and the projection loss in Eq. 3.34, and have
$$\min \; \frac{1}{2} \sum_{j}^{J} \|x - \omega_j^{-1} \circ \hat{x}_j\|^2. \tag{3.40}$$
We set $g(x) = \frac{1}{2}\sum_{j}^{J} \|x - \omega_j^{-1} \circ \hat{x}_j\|^2$ and solve $g'(x) = 0$, which gives
$$x = \frac{1}{J}\sum_{j}^{J} \omega_j^{-1} \circ \hat{x}_j, \tag{3.41}$$
showing that multiple projections can better reconstruct the full kernels from their binary counterparts.
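Eq. 3.41 simply averages the per-projection reconstructions $\omega_j^{-1} \circ \hat{x}_j$. The following NumPy sketch illustrates this numerically; it is not taken from the original implementation, the function name `reconstruct_from_projections` is ours, and it assumes $\circ$ is the element-wise product with element-wise non-zero $\omega_j$, so that $\omega_j^{-1}$ is the element-wise reciprocal.

```python
import numpy as np

def reconstruct_from_projections(omegas, x_hats):
    """Average the per-projection reconstructions (Eq. 3.41).

    omegas : list of J arrays, projection parameters w_j (assumed element-wise non-zero).
    x_hats : list of J arrays, quantized kernels x_hat_j = w_j * x.
    """
    recons = [x_hat / omega for omega, x_hat in zip(omegas, x_hats)]
    return np.mean(recons, axis=0)

# Toy check: with exact projections x_hat_j = w_j * x, the average recovers x.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3))
omegas = [rng.uniform(0.5, 2.0, size=(3, 3)) for _ in range(4)]
x_hats = [omega * x for omega in omegas]
assert np.allclose(reconstruct_from_projections(omegas, x_hats), x)
```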
3.5.4 Projection Convolutional Neural Networks
PCNNs, shown in Fig. 3.12, use DBPP for model quantization. We accomplish this by reformulating the projection loss of Eq. 3.34 within the deep learning paradigm as
$$L_P = \frac{\lambda}{2} \sum_{l,i}^{L,I} \sum_{j}^{J} \bigl\|\hat{C}^{l,[k]}_{i,j} - W^{l,[k]}_{j} \circ \bigl(C^{l,[k]}_{i} + \eta\,\delta_{\hat{C}^{l,[k]}_{i,j}}\bigr)\bigr\|^2, \tag{3.42}$$
where $C^{l,[k]}_{i}$, $l \in \{1, \ldots, L\}$, $i \in \{1, \ldots, I\}$, denotes the $i$th kernel tensor of the $l$th convolutional layer in the $k$th iteration. $\hat{C}^{l,[k]}_{i,j}$ is the quantized kernel of $C^{l,[k]}_{i}$ obtained via the projection $P^{l,j}_{\Omega}$, $j \in \{1, \ldots, J\}$, as
$$\hat{C}^{l,[k]}_{i,j} = P^{l,j}_{\Omega}\bigl(W^{l,[k]}_{j}, C^{l,[k]}_{i}\bigr), \tag{3.43}$$
where $W^{l,[k]}_{j}$ is a tensor calculated by duplicating a learned projection matrix $W^{l,[k]}_{j}$ along the channels, so that it fits the dimension of $C^{l,[k]}_{i}$. $\delta_{\hat{C}^{l,[k]}_{i,j}}$ is the gradient at $\hat{C}^{l,[k]}_{i,j}$ calculated based on $L_S$, that is, $\delta_{\hat{C}^{l,[k]}_{i,j}} = \frac{\partial L_S}{\partial \hat{C}^{l,[k]}_{i,j}}$. The iteration index $[k]$ is omitted hereafter for simplicity.
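To make Eq. 3.42 concrete, here is a minimal PyTorch-style sketch of the projection loss for a single layer and a single kernel tensor. It is an illustration under our own assumptions, not the authors' implementation: $\circ$ is taken to be the element-wise product, the projection $P_\Omega$ is assumed to round the scaled kernel onto the discrete set $\Omega_N$ (the exact projection is defined elsewhere in the text), and the names `project_to_omega` and `projection_loss` are ours.

```python
import torch

def project_to_omega(z, omega_levels):
    """Map each element of z to the nearest value in the discrete set Omega_N
    (one plausible form of P_Omega; assumed here for illustration)."""
    levels = torch.as_tensor(omega_levels, dtype=z.dtype, device=z.device)  # shape (N,)
    idx = (z.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return levels[idx]

def projection_loss(C, W_list, delta_list, omega_levels, eta, lam):
    """Projection loss of Eq. 3.42 for one kernel tensor C of one layer.

    C          : full-precision kernel tensor C_i^l.
    W_list     : J projection tensors W_j^l, already duplicated to C's shape.
    delta_list : J gradients of L_S w.r.t. each quantized kernel C_hat_{i,j}^l.
    """
    loss = C.new_zeros(())
    for W_j, delta_j in zip(W_list, delta_list):
        C_hat_j = project_to_omega(W_j * C, omega_levels)        # Eq. 3.43 (assumed form)
        loss = loss + ((C_hat_j - W_j * (C + eta * delta_j)) ** 2).sum()
    return 0.5 * lam * loss
```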
In PCNNs, both the cross-entropy loss and the projection loss are used to build the total loss:
$$L = L_S + L_P. \tag{3.44}$$
The proposed projection loss regularizes the continuous values so that they converge onto $\Omega_N$ while the cross-entropy loss is minimized, as illustrated in Fig. 4.15 and Fig. 3.25.
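The following sketch shows how Eq. 3.44 could drive one training step, reusing the illustrative `projection_loss` above. The optimizer, criterion, and the assumption that the $\delta$ gradients are carried over from the previous iteration's backward pass are all ours, not details given in this section.

```python
def train_step(model, inputs, targets, optimizer, criterion,
               kernels, W_lists, delta_lists, omega_levels, eta, lam):
    """One step minimizing the total loss L = L_S + L_P (Eq. 3.44)."""
    optimizer.zero_grad()
    L_S = criterion(model(inputs), targets)            # cross-entropy loss L_S
    L_P = sum(projection_loss(C, Ws, deltas, omega_levels, eta, lam)
              for C, Ws, deltas in zip(kernels, W_lists, delta_lists))
    loss = L_S + L_P                                   # Eq. 3.44
    loss.backward()
    optimizer.step()
    return loss.detach()
```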